Bootstrapping

Simple Stock Returns

The essence of bootstrapping is that we don't know what the distribution of stock returns is, so we simply use historic stock returns as our sample space

This spreadsheet gives FTSE100 data

Use it to develop a Monte Carlo method which prices an option on the FTSE100

Method
  • Pick a random day in the history of the data you have
  • Take the market return for that day
  • Apply this to your current simulated share price
  • Repeat for as long as you are simulating the share price behaviour
  • Run thousands of times to perform your Monte Carlo calculation
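
A minimal VBA sketch of this loop is given below. It assumes the historic daily returns sit in a one-column worksheet range named "Returns"; that name, the 260 trading days per year and the flat discount rate are assumptions of the sketch, not part of the data.

```vba
' Sketch only: simulate one share price path by bootstrapping daily returns.
' Assumes a one-column worksheet range named "Returns" of simple daily returns.
Function BootstrapPrice(S0 As Double, nDays As Long) As Double
    Dim rets As Variant, S As Double, i As Long, pick As Long
    rets = Range("Returns").Value
    S = S0
    For i = 1 To nDays
        pick = Int(Rnd() * UBound(rets, 1)) + 1   ' pick a random historic day
        S = S * (1 + rets(pick, 1))               ' apply that day's return
    Next i
    BootstrapPrice = S
End Function

' Price a European call by averaging discounted payoffs over many paths.
' The 260-day year and the flat rate r are assumptions of this sketch.
Function BootstrapCallPrice(S0 As Double, K As Double, nDays As Long, _
                            nSims As Long, r As Double) As Double
    Dim i As Long, S As Double, total As Double
    For i = 1 To nSims
        S = BootstrapPrice(S0, nDays)
        If S > K Then total = total + (S - K)     ' call payoff max(S - K, 0)
    Next i
    BootstrapCallPrice = Exp(-r * nDays / 260) * total / nSims
End Function
```

For example, =BootstrapCallPrice(7000, 7200, 260, 10000, 0.03) would estimate a one-year call struck at 7200.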
Issues

Can you think of a problem with this method?

Using day-on-day returns will be very slow

Using day-on-day returns also means we will not get any excess kurtosis, as the central limit theorem will apply over longer horizons. In other words, we are ignoring any conditional heteroscedasticity

Can you think of solutions?

You could use a longer time period than a day, although this would naturally reduce the number of sample points

You could use variable lengths of time for each data point

You could use different lengths of time for the sample than you were using in the MC simulation and then scale them

You could make your random selection of bootstrapped data points dependent on where the previous one was chosen from, for example picking a date within ±10 days of the previous data point used, effectively taking a random walk around your historic data.

Making Bootstrapping more Realistic

We are going to assess the log returns of the FTSE100 data over 1 year periods.

We only have 34 data points but we can still measure the skew and kurtosis of the returns from this amount of data.

We need the following formulas

Sample Excess Kurtosis $=\frac{\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^4}{\left(\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^2\right)^2}-3$

Sample Skewness $=\frac{\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^3}{\left[\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^2\right]^{3/2}}$

Note: Ideally we would use the unbiased estimators at this point, but given the sample size and the intrinsic lack of accuracy in this process, the pure sample statistics will probably suffice

Exercise: Write VBA functions to calculate the mean, standard deviation, skewness and kurtosis of a data set
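
A sketch solution, using the $\frac{1}{n}$ (population) forms to match the formulas above; the function names are of course just suggestions.

```vba
' Sketch solutions: population (1/n) moments, matching the formulas above.
Function DataMean(data As Range) As Double
    Dim c As Range, total As Double, n As Long
    For Each c In data
        total = total + c.Value
        n = n + 1
    Next c
    DataMean = total / n
End Function

' k-th central moment: (1/n) * sum of (x - mean)^k
Function CentralMoment(data As Range, k As Integer) As Double
    Dim c As Range, m As Double, total As Double, n As Long
    m = DataMean(data)
    For Each c In data
        total = total + (c.Value - m) ^ k
        n = n + 1
    Next c
    CentralMoment = total / n
End Function

Function DataStDev(data As Range) As Double
    DataStDev = Sqr(CentralMoment(data, 2))
End Function

Function DataSkewness(data As Range) As Double
    DataSkewness = CentralMoment(data, 3) / CentralMoment(data, 2) ^ 1.5
End Function

Function DataExcessKurtosis(data As Range) As Double
    DataExcessKurtosis = CentralMoment(data, 4) / CentralMoment(data, 2) ^ 2 - 3
End Function
```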

Random Walk in Share Price History

This time instead of taking random days from the past we start with one random day from the past and then randomly choose an adjacent day

We continue this process, taking the price moves on successive adjacent days, either going forwards or backwards

By running this simulation thousands of times you can measure the skew and kurtosis of the distribution so generated
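
A sketch of this variant, again assuming the historic returns live in a range named "Returns"; the reflecting behaviour at the two ends of the history is an assumption of the sketch.

```vba
' Sketch: random walk through the share price history.
' Start at a random day, then repeatedly step to an adjacent day.
Function RandomWalkPrice(S0 As Double, nDays As Long) As Double
    Dim rets As Variant, S As Double, i As Long, pos As Long, n As Long
    rets = Range("Returns").Value
    n = UBound(rets, 1)
    pos = Int(Rnd() * n) + 1                  ' one random starting day
    S = S0
    For i = 1 To nDays
        S = S * (1 + rets(pos, 1))
        If Rnd() < 0.5 Then pos = pos - 1 Else pos = pos + 1
        If pos < 1 Then pos = 2               ' reflect at the ends of the data
        If pos > n Then pos = n - 1
    Next i
    RandomWalkPrice = S
End Function
```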

Using a Longer Period

This time we will use randomly chosen days in the past but then take the return over a one month period.

This model will run more quickly, as we only need 12 return values per one-year simulation

Can you see a problem with this method?

Yes - there are now only 12 × 34 = 408 independent data points to choose from

Can you also change your model so that you use whole-year segments of the share price return?

Ratioing up One Day's Returns

This time we randomly choose days in the past and then ratio up the return from that day by an appropriate amount to simulate the return for a whole year

We could also do this for 12 separate months or even take a monthly return and ratio up by $\sqrt{12}$
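
Why $\sqrt{12}$? A quick justification, assuming the 12 monthly log returns are independent with common variance $\sigma^2$, so that variances (not standard deviations) add:

$\operatorname{Var}\left(\sum_{i=1}^{12} r_i\right) = 12\sigma^2 \quad \Rightarrow \quad \sigma_{\text{year}} = \sqrt{12}\,\sigma_{\text{month}}$

Scaling a single monthly return by $\sqrt{12}$ therefore reproduces the annual volatility, though note it scales the mean by $\sqrt{12}$ rather than 12, so it is only reasonable when the drift is small relative to the volatility.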

What are we trying to achieve?

Each time we perform the bootstrapping in a different way we are trying to recreate the features of the actual share price return distribution.

For each method there are a number of different parameters we can change, and it makes sense to try out different methods until we have a realistic simulation of the actual share we are modelling (in our case the whole FTSE100).

Bootstrapping the ODP (GI example)

Chain Ladder

How are the triangles made?

First we need to consider how policies being written, accidents happening and being reported, and claims being paid out are used to fill values into the various different run-off triangles that we can use.

Once we have data in a triangle the mathematics of calculating the reserve is the same whatever the triangle represents

Notes

First we need to consider a timeline

A policy is sold (written) in 2012

The premium is then earned continuously over the next 12 months

An accident happens in 2013 during the term of the policy

In 2015 this accident is reported to the insurance company and the benefit is believed to be £450

In 2016 this claim is settled for £600

So where do these numbers go in the different triangles?

First we consider IBNR by accident year

The bold line represents the current time at year end 2016

As the accident happened in 2013 and was reported in 2015 we can see this represents development year 3 for accident year 2013

What if we do an underwriting year triangle? Then we group by the year in which the policy giving rise to each accident was written

This policy was written in 2012, so the accident is reported in development year 4

What about reported but not settled? This time we group accidents by the year in which they were reported

So this accident goes in reported year 2015 and is settled in development year 2

What about the paid triangle? This considers when claims were actually paid out

If this is grouped by accident year then this claim was paid in 2016 which is development year 4 for accident year 2013

But we can also do a paid triangle by underwriting year

This time the policy was written in 2012 and the claim is finally paid out in 2016, that is, development year 5

Spreadsheets

There are a number of different spreadsheets you can look at to back up the calculations in this section of the course:

Classic triangulation methods: Basic Reserving Calculations. This spreadsheet contains 6 years of data to illustrate the methods more clearly.

Simplified 4-year spreadsheet (suitable for hand calcs in lecture): Basic reserving (4 year).xls

Chain Ladder - Calculation Method

The chain ladder requires us to follow the steps below

Gather Data

Gather our data into a run-off triangle for whatever kind of reserve we are trying to calculate

Incremental          Development year
Accident year         1     2     3     4
2013                 50    30    15     5
2014                 60    40    25     -
2015                 40    30     -     -
2016                 80     -     -     -
Cumulate Data

Then we sum along the rows to cumulate the data

Cumulative           Development year
Accident year         1     2     3     4
2013                 50    80    95   100
2014                 60   100   125
2015                 40    70
2016                 80
Filling in the Blank Cells

The blank cells represent the future, which we do not yet know. The purpose of this process is to make as good an estimate as possible of what is going to happen in the future

Can you guess a figure you might put in cell(2014,4)?

Guess = $125 \times \frac{100}{95} \approx 132$

What about cell(2015,4)?

We might be tempted to choose $70 \times \frac{100}{80}$, but this would not be a good guess because we have not used the accident year 2014 data that we have.

Cell(2015,3) is more intuitive. This time we have two years of data which have been developed for 3 years, so we can use both of these years to guess this cell.

Guess = $70 \times \frac{95+125}{80+100} \approx 86$

We can now see that:

the ratio of development year 4 to development year 3 is just $\frac{100}{95}$ and

the ratio of development year 3 to development year 2 is just $\frac{95+125}{80+100}$

These numbers are called the development factors, and once we have calculated them for each development year we can use them to fill in the whole triangle

The following table sets out the calculation as you will often see it in a spreadsheet. A convenient way of organising the data is to sum each column, and then to drop the column's last value when calculating the development factor into the following year, so that both sums cover the same accident years

Development year              1        2        3        4
Sum of column               230      250      220      100
Last value in column         80       70      125      100
Sum less last value         150      180       95        -
Dev factor $f_{d,d+1}$           1.6667   1.2222   1.0526

Each development factor divides the next column's sum by the current column's "sum less last value", for example $f_{1,2} = \frac{250}{150} = 1.6667$ and $f_{2,3} = \frac{220}{180} = 1.2222$.
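
A sketch of this calculation in VBA, assuming the cumulative triangle has been read into a 1-based 2D Variant array (for example via Range(...).Value) with the unknown future cells left Empty:

```vba
' Sketch: development factor from column d to column d+1 of a cumulative
' triangle held in a 2D Variant array; future cells are Empty.
Function DevFactor(tri As Variant, d As Long) As Double
    Dim a As Long, numer As Double, denom As Double
    For a = 1 To UBound(tri, 1)
        ' only accident years with hard data in BOTH columns contribute
        If Not IsEmpty(tri(a, d)) And Not IsEmpty(tri(a, d + 1)) Then
            numer = numer + tri(a, d + 1)
            denom = denom + tri(a, d)
        End If
    Next a
    DevFactor = numer / denom
End Function
```

For the triangle above, DevFactor(tri, 1) returns $\frac{250}{150} = 1.6667$, and so on.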

We often notate the development factors $f_{1,2}$ and $f_{2,3}$ etc.

We should note the relationship $f_{1,3} = f_{1,2} \times f_{2,3}$ etc and specifically:

$f_{1,n} = f_{1,2} \times f_{2,3} \times f_{3,4}... \times f_{n-1,n}$ and

$f_{2,n} = f_{2,3} \times f_{3,4}... \times f_{n-1,n}$ and so on

And so we can easily continue to finish off the run-off triangle:

Accident year       1     2     3     4   Reserve
2013               50    80    95   100        -
2014               60   100   125   132        7
2015               40    70    86    90       20
2016               80   133   163   172       92
Total IBNR                                    118

(The projected values and reserves are shown rounded; the IBNR total of 118 is the sum of the unrounded reserves, which is why it differs slightly from 7 + 20 + 92.)

For each accident year the reserve to be held is the projection to the end of the triangle MINUS the LAST piece of "hard" data for that year. In the case of IBNR - this last piece of data is the last year for which we actually have the accident reports.

The total IBNR (or whatever reserve we are calculating) is then the sum of these values for each accident year
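
Putting the pieces together, a sketch that completes the triangle and sums the reserves, reusing the DevFactor function above (again, the array layout is an assumption of the sketch):

```vba
' Sketch: project each accident year to ultimate and sum the reserves.
' "tri" is a 1-based cumulative triangle with Empty future cells.
Function TotalReserve(tri As Variant) As Double
    Dim a As Long, d As Long, j As Long, nDev As Long
    Dim lastKnown As Double, ultimate As Double
    Dim f() As Double
    nDev = UBound(tri, 2)
    ReDim f(1 To nDev - 1)
    For d = 1 To nDev - 1              ' factors from the hard data only
        f(d) = DevFactor(tri, d)
    Next d
    For a = 1 To UBound(tri, 1)
        For d = nDev To 1 Step -1      ' find the last "hard" value in the row
            If Not IsEmpty(tri(a, d)) Then Exit For
        Next d
        lastKnown = tri(a, d)
        ultimate = lastKnown
        For j = d To nDev - 1          ' roll up by the remaining dev factors
            ultimate = ultimate * f(j)
        Next j
        TotalReserve = TotalReserve + (ultimate - lastKnown)
    Next a
End Function
```

Run on the cumulative triangle above, this returns approximately 118.2, in line with the IBNR total.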

Chain ladder - further considerations

Issues to consider when looking at development triangles are:

Many of the issues we come across are similar to general data-handling issues

There are many other methods which are variations on a theme of the chain ladder method:

Berquist Sherman

Adjust historic claims values to bring them into line with up-to-date claims handling practices

Curve fitting

Chain ladder is in fact a special case of curve fitting in which we fit the development factors exactly. More generally we could fit a curve which is a close approximation to the actual development factors

Trend analysis

Similar to curve fitting, in that data can be cut into different cohorts and any key features and trends analysed before recompiling back into a set of development factors, or a more general relationship between different development years

Bootstrapping

A helpful paper by Julian Lowe, "A Practical Guide to Measuring Reserve Variability Using: Bootstrapping, Operational Time and a Distribution Free Approach", can be found here

Full password protected spreadsheet

Bootstrapping involves using the historic claims data to simulate the uncertainty in the IBNR claims:

Calculation Steps

Bootstrapping is best understood by actually doing it in a spreadsheet (as per e-lecture)

Reconciliation with the alpha-beta method

The other method for calculating the smoothed triangle forms the smoothed incremental triangle by multiplying the $\alpha$ and $\beta$ parameters together

We can see below that $\beta_1$ is set to be 1.00

Smoothed (Incremental)

             $\beta_1 = 1.00$         $\beta_2$                    $\beta_3$                    $\beta_4$
$\alpha_1$   $\alpha_1 \times 1.00$   $\alpha_1 \times \beta_2$    $\alpha_1 \times \beta_3$    $\alpha_1 \times \beta_4$
$\alpha_2$   $\alpha_2 \times 1.00$   $\alpha_2 \times \beta_2$    $\alpha_2 \times \beta_3$    $\alpha_2 \times \beta_4$
$\alpha_3$   $\alpha_3 \times 1.00$   $\alpha_3 \times \beta_2$    $\alpha_3 \times \beta_3$    $\alpha_3 \times \beta_4$
$\alpha_4$   $\alpha_4 \times 1.00$   $\alpha_4 \times \beta_2$    $\alpha_4 \times \beta_3$    $\alpha_4 \times \beta_4$

Denoting the cumulative claims for accident year a, development year d as: $C_{a,d}$

For a 4-year triangle we have that:

$\alpha_1 = \frac{C_{1,4}}{f_{1,2} \times f_{2,3} \times f_{3,4}}$

and $\alpha_2 = \frac{C_{2,3}}{f_{1,2} \times f_{2,3} }$ etc

Now for the $\beta$s, which are a little harder

$\alpha_1 \times \beta_2 = \frac{C_{1,4}}{f_{2,3} \times f_{3,4}} - \alpha_1$ because this is an incremental value

$\therefore \alpha_1 \times \beta_2 = \alpha_1 \times f_{1,2} - \alpha_1 = \alpha_1 \times (f_{1,2} - 1)$

$\therefore \beta_2 = f_{1,2} - 1$

For $\beta_3$:

We have that

$\alpha_1 \times \beta_3 = \frac{C_{1,4}}{ f_{3,4}} - \alpha_1 - \alpha_1 \times \beta_2$ again because this is an incremental value

$\therefore \alpha_1 \times \beta_3 = \alpha_1 \times f_{1,2} \times f_{2,3} - \alpha_1 - \alpha_1 \times \beta_2$

$\therefore \beta_3 = f_{1,2} \times f_{2,3} -1 - \beta_2$

$\therefore \beta_3 = f_{1,2} \times f_{2,3} - f_{1,2}$ or $ \beta_3 = f_{1,3} - f_{1,2}$

From which we see the general formula for $\beta$ which is:

$ \beta_n = f_{1,n} - f_{1,n-1}$
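
As a check, using the development factors from the worked example earlier ($f_{1,2} = 1.6667$, $f_{2,3} = 1.2222$, $f_{3,4} = 1.0526$):

$\beta_2 = f_{1,2} - 1 = 0.6667$

$\beta_3 = f_{1,3} - f_{1,2} = 2.0370 - 1.6667 = 0.3704$

$\beta_4 = f_{1,4} - f_{1,3} = 2.1442 - 2.0370 = 0.1072$

The $\beta$s (including $\beta_1 = 1$) sum to $2.1443 \approx f_{1,4}$, as they must, since the cumulative position at the final development year for any accident year is $\alpha \times f_{1,4}$.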

Bootstrapping the ODP (over-dispersed Poisson model)

The over-dispersed Poisson model is a variant on standard bootstrapping in which, instead of taking the difference between the actual and fitted values, we take the ratio of the actual to the fitted values

These ratios are then randomly arranged around the run-off triangle and then multiplied back into the fitted values to produce a randomised run-off triangle

As with standard bootstrapping, this process is then repeated thousands of times to produce a distribution of reserve estimates
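
A sketch of a single ODP-style resample, assuming the actual and fitted incremental triangles are held in matching 1-based 2D Variant arrays with the future cells left Empty (the chain ladder is then re-run on each resampled triangle to collect the reserve distribution):

```vba
' Sketch: one ODP-style resample of an incremental triangle.
' "actual" and "fitted" are matching 2D Variant arrays; future cells are Empty.
Function ODPResample(actual As Variant, fitted As Variant) As Variant
    Dim a As Long, d As Long, k As Long, n As Long
    Dim ratios() As Double, result As Variant
    ReDim ratios(1 To UBound(actual, 1) * UBound(actual, 2))
    ' 1. Collect the actual / fitted ratios from the filled-in cells
    For a = 1 To UBound(actual, 1)
        For d = 1 To UBound(actual, 2)
            If Not IsEmpty(actual(a, d)) Then
                n = n + 1
                ratios(n) = actual(a, d) / fitted(a, d)
            End If
        Next d
    Next a
    ' 2. Scatter randomly chosen ratios back over the fitted values
    result = fitted   ' copies the array, preserving the triangle's shape
    For a = 1 To UBound(actual, 1)
        For d = 1 To UBound(actual, 2)
            If Not IsEmpty(actual(a, d)) Then
                k = Int(Rnd() * n) + 1
                result(a, d) = fitted(a, d) * ratios(k)
            End If
        Next d
    Next a
    ODPResample = result
End Function
```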